A program executing on a first processor in an MP configuration,
awaiting the release of a resource held by another processor, detects
the expiration of a fixed time interval and initiates a hierarchy of
recovery actions designed to cause the resource to be freed. These
actions, targeted at a processor believed to be the one currently
holding the resource, are taken only if that processor is not executing
an "exempt" routine. The actions, taken in order of increasing severity,
are: wait for a second fixed time interval; terminate the routine on the
resource-holding processor, allowing retry; terminate the routine
without allowing retry; invoke Alternate CP Recovery. The hierarchy is
escalated against the target processor until that processor releases the
resource, and against other processors in the configuration until the
resource is acquired by the first processor. These actions may proceed
in parallel for multiple detecting and target processors within an MP
environment.
Systematic recovery of excessive spin loops in an N-way MP environment






This invention relates to the field of systems control programming.
More specifically, it relates to mechanisms for detecting and recovering
from spin loop situations in multiprocessor system configurations.

A spin loop is a condition which occurs in a multiprocessor (MP) system
when a routine executing on one Central Processor (CP) is unable to
complete a function due to a dependence on some action being taken on
another CP. If the function must be completed before further processing
can be performed, the routine may enter a loop and spin, waiting for the
required action to be taken on the other CP.

Spin loops typically occur in systems such as MVS/XA and MVS/ESA when a
system routine is attempting to perform one of the following functions:

1. Communicate with other CPs. For example, when an MVS system routine
running on one CP determines that an address space should be swapped out
of main storage, it is necessary to notify all other CPs to purge their
translation lookaside buffers of addresses related to that address
space. This is accomplished by issuing a SIGP (Signal Processor)
Emergency Signal to the other CPs. Until each CP responds with an
indication that it has performed the required purge, the initiating MVS
routine will enter a spin loop to await completion of the required
action.

2. Serialization of function across all CPs. MVS uses system locks to
serialize execution of many functions across all of the CPs in the
system. This is necessary to ensure the integrity of the operation being
performed. The general locking architecture used in the MVS system is
described in the IBM Technical Disclosure Bulletin, Dec. 1973, Volume
16, No. 7, at page 2420. As an example, if an MVS routine on one CP
wishes to process the results of an I/O interrupt from a device, it must
ensure that status about the interrupt is not inadvertently corrupted by
a system routine on another CP wishing to initiate a new I/O operation
to the device. This is accomplished via the use of a system lock per
device. If a system routine requires the lock for a given device which
is owned by a routine on another CP, it will enter a spin loop until the
lock becomes available.

Spin loops are a normal phenomenon of an MP system. They are almost
always extremely brief and non-disruptive to the operating environment.
However, when their duration becomes excessive, spin loops become a
problem which requires recovery action to resolve. In the prior art,
those actions were determined and performed by the system operator.

Excessive spin loop (ESL) conditions can be triggered for a wide variety
of causes. For example, the CP which is holding a resource required by a
routine spinning on another CP may be:

o Experiencing a hardware failure

o Experiencing a software failure

o Performing a critical function which takes an unusually long period of
time to complete

o Stopped by the operator or by the operating system

In the past, the MVS operating system detected the existence of an ESL
and surfaced the condition to the system operator. The detection was
performed by the routine in the spin loop, after spinning for a full ESL
timeout interval, which was approximately 40 seconds in MVS. It then
invoked the Excessive Spin Notification Routine to issue a message to
the operator requesting recovery action.

Determination of the correct recovery action to resolve an ESL condition
is complex, error-prone, and especially critical given the severe impact
such a condition has on the operating system. Due to the frequency of
inter-processor communication and cross-CP resource serialization in an
MP environment, when one CP fails, all other CPs very quickly enter spin
loops until the problem on the failing CP is resolved.

According to the prior art, there were three recovery actions that an
operator could take when an ESL occurred. Each has benefits and
drawbacks associated with it. The actions are as follows:

1. Respond to the ESL message to continue to spin on the detecting CP
for another excessive spin loop interval.

This will only have benefit if the cause of the spin loop is temporary,
i.e., if it is due to some unusually lengthy but legitimate processing
on the CP causing the condition.

The problem here is that neither the operator nor MVS knows whether the
condition is temporary or not. If the operator does not respond to
continue the spin and instead performs a recovery action, the
possibility exists that an important MVS system function will be the
target of that destructive recovery action. This may even result in an
unnecessary system crash.

On the other hand, if the operator does decide to continue the spin, how
many times should the spin be allowed to repeat before taking a more
forceful action? Each response to continue in the spin loop further
prolongs the time that the system is unavailable.

2. Respond to the ESL message to trigger the Alternate CP Recovery (ACR)
function. The general ACR function is described in IBM Technical
Disclosure Bulletin, Nov. 1973, Volume 16, No. 6, at page 2005. The
algorithm used to determine which is the failing CP in an N-way
environment is described in IBM Technical Disclosure Bulletin, July
1983, Volume 26, No. 2, at page 748.
This causes the recovery routines protecting the program running on the
failing CP to be invoked. This is done to allow the recovery routines to
release resources held on the failing CP which may be required by the CP
currently in a spin loop.

The drawback of this action is that it also results in removing the
"failing" CP from use by the MVS operating system. Experience has shown
that excessive spin loops are usually caused by non-CP related hardware
or software errors. The recovery processing associated with ACR may
resolve the spin loop, but removing the CP from the configuration is
highly disruptive and also unnecessary in the majority of spin loop
scenarios.

Even with a highly skilled operator, who determines and performs each
recovery action after only 30 seconds delay, the system is completely
unavailable for several minutes. In addition, the CP is unnecessarily
removed from system use for an undetermined period of time.

Another drawback of the ACR action can be that recovery routines are
allowed to retry after being invoked. Therefore, the ability of the ACR
action to resolve the spin loop and avoid a system outage is highly
dependent on the effectiveness of the recovery routines protecting the
failing program. If the recovery routines do not release the resources
required by the CP in the spin loop, or retry back to a point in the
failing program which caused the problem to begin with, the spin loop
condition will not be resolved.

3. Respond to the ESL message to continue the spin on the detecting CP
AND initiate a RESTART from the system console to interrupt the routine
executing on the failing CP. This action will trigger invocation of
recovery routines to force the release of resources held on the failing
CP.

The drawback of this action is that it results in termination of the
current unit of work, because recovery routines are not allowed to retry
when RESTART is invoked. Thus, even though the recovery routines may be
able to successfully resolve the problem causing the spin loop, the
program is forced to terminate. If a critical job or subsystem is active
on the failing CP when the spin loop is detected, invocation of RESTART
will cause loss of that critical subsystem and perhaps require re-IPL of
the system.

Another drawback is that the RESTART procedure is more complicated than
simply responding to a message and is therefore more prone to error.

Most ESL conditions, owing to incorrect or inadequate recovery actions,
end in a system crash and an extended outage.

In addition to the complexities of the decisions required of the
operator to recover from an ESL condition, the mechanics of effecting
the recovery become significantly more involved if the operator is
unable to answer the ESL message and instead must respond to the spin
loop restartable wait state. For example, for an ACR response, the
operating procedure involves:

1. Stopping all CPs in the system

2. Storing the ACR response in main storage on the detecting CP (which
may be in violation of the installation's policies)

3. Starting all the CPs except the detecting and failing CPs

4. Restarting the detecting CP to initiate recovery.

SUMMARY OF THE INVENTION 

The present invention is a system and process, in a multiprocessor
system environment, for detecting and taking steps to automatically
recover from excessive spin loop conditions. It comprises functions and
supporting indicators that clearly identify true spin loop situations,
and present a hierarchical series of recovery actions, some new to the
ESL environment, that minimize the impact of the condition on the
multiprocessor system and its workload.

It is an object of the present invention to provide an automatic and
efficient mechanism for detecting and recovering from excessive spin
loop situations in an MP environment.

It is a further object of this invention to recognize persistent,
related spin loop situations in an MP environment, and recover
automatically from them. This includes recovering in parallel from
multiple ESL occurrences involving more than one failing CP.

It is a further object of this invention to present a hierarchy of
recovery actions representing progressively more severe actions, so that
a severe action is taken only when a less severe action has failed to
resolve the problem.

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a linear time flow diagram showing an overview of the
Excessive Spin Loop Recovery (ESLR) Function operating in a 2-way MP
environment.






EP 0 351 z2q A2 



Fig. 3 is a function flow diagram showing the hierarchy of recovery
actions taken within ESLR processing.

Fig. 4 is a linear time flow diagram showing an example in which ESLR
processing is used to resolve a spin loop deadlock situation in a 6-way
MP environment.

Figure 1 shows an environment in which an embodiment of the present
invention operates. It illustrates a 2-way MP system consisting of
Central Processor 1 (100) and Central Processor 2. Central Processor 1,
having obtained spin-type lock X at time t0 (101), subsequently enters a
disabled loop (102); Central Processor 2, requesting spin lock X at time
t0+1 (110), is unable to obtain it, and so "spins", periodically
re-requesting the lock (111).

As with systems of the prior art, it is the responsibility of the
processes which have requested a spin-type lock to determine that a
"long" time has elapsed since the lock was requested (a time interval
referred to as the ESL, or Excessive Spin Loop, interval); having
recognized that this period of time has elapsed (112), the requesting
processor invokes the Excessive Spin Loop Recovery (ESLR) processing of
this invention (113). This processing ultimately results in the release
of the lock by processor 1 (103), and allows the subsequent acquisition
of the lock by processor 2 (114).
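
The detection step of Figure 1 can be pictured with a minimal sketch. It is an illustration only, not the MVS implementation: the function name `spin_for_lock`, the callback `on_excessive_spin`, and the `max_intervals` bound are assumptions introduced here so that the example terminates, and a Python `threading.Lock` stands in for the spin-type lock.

```python
import threading
import time


def spin_for_lock(lock, esl_interval, on_excessive_spin, max_intervals=3):
    """Spin on `lock`, reporting each full ESL interval that elapses.

    Returns True once the lock is acquired, or False after `max_intervals`
    detections (a bound added only so that this sketch always terminates).
    """
    detections = 0
    deadline = time.monotonic() + esl_interval
    while True:
        if lock.acquire(blocking=False):    # periodic re-request of the lock
            return True
        if time.monotonic() >= deadline:    # a full ESL interval has elapsed
            detections += 1
            on_excessive_spin(detections)   # stands in for the call to ESLR
            if detections >= max_intervals:
                return False
            deadline = time.monotonic() + esl_interval
```

The spinning routine itself measures the interval, as the text describes, and simply resumes spinning after each call to recovery processing.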

Referring to figure 2, excessive spin loop recovery processing is
entered when the CP requesting the lock detects that it has been waiting
for the lock for an excessive amount of time. On entry, this routine
checks whether excessive spin loop recovery processing is active on any
other CP in the complex by checking the CVT global control block (24)
via the atomic "Test and Set" operation. If the answer is yes, there is
an immediate return and this indication is not treated as a detection of
an excessive spin loop.
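
A minimal sketch of this entry check follows, assuming a non-blocking lock acquire as a stand-in for the atomic "Test and Set" on the CVT flag. The names `EslrGate` and `eslr_entry` are hypothetical and do not come from the source.

```python
import threading


class EslrGate:
    """Global 'ESLR active' flag; `try_enter` plays the role of Test and Set."""

    def __init__(self):
        self._flag = threading.Lock()

    def try_enter(self):
        # Non-blocking acquire: atomically tests the flag and sets it if clear.
        return self._flag.acquire(blocking=False)

    def leave(self):
        self._flag.release()


def eslr_entry(gate, recover):
    """Run `recover` unless ESLR is already active on another CP."""
    if not gate.try_enter():
        return "already-active"   # immediate return, not counted as a detection
    try:
        return recover()
    finally:
        gate.leave()
```

Only one caller at a time gets past the gate; every other caller returns at once and goes back to spinning.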

If the answer is no, the failing CP is identified as indicated in the
aforementioned TDB (Vol. 26, No. 2, at p. 748), and the identity of the
failing CP is saved. A check is then made to see whether any spin loop
recovery action was taken for this failing CP within the last excessive
spin loop interval. If so, subsequent recovery processing is bypassed.
In tightly-coupled MP systems of three or more CPs, this is done because
two different CPs could enter ESLs against the same failing CP within
the same interval. When the first of these two ESLs results in a
recovery action, the second ESL must be prevented from initiating
another (more disruptive) action before the first one has a chance to
complete.

The Excessive Spin Loop Recovery Processor (ESLR) maintains a table in
global storage of the time of the last ESL recovery action against each
CP. This last-action (LAT) table (25) has one entry per CP. ESLR then
compares the clock value on entry with the LAT entry for the failing CP.
If an ESL interval has not passed since the last action against this
failing CP, no action is taken. However, the last detection time field,
LASTDT (28), is updated, because this detection must be recorded to
ensure the proper determination of a persistent problem. The clock value
is again obtained and then stored in the global ESL field (28),
indicating that this detection is treated as a global detection, and the
routine returns to the caller.
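
The pacing check can be sketched as below. This is a hedged illustration: the dictionary layout (`"lat"`, `"lastdt"`), the function name `paced_out`, and the use of integer tenths of a second for times are assumptions made for the example.

```python
def paced_out(state, failing_cp, now, esl_interval):
    """True if an action was taken against `failing_cp` within the last interval.

    When the caller is paced out, only the global last-detection time is
    refreshed, so the detection still counts toward the persistence test.
    """
    last_action = state["lat"].get(failing_cp)
    if last_action is not None and now - last_action < esl_interval:
        state["lastdt"] = now
        return True
    return False


# Times in tenths of a second; ESL interval of 10 s, last action at T+10.1.
state = {"lat": {3: 101}, "lastdt": 101}
early = paced_out(state, 3, 130, 100)   # second CP detects 2.9 s later
later = paced_out(state, 3, 201, 100)   # a full interval after the action
```

Here `early` is True and `later` is False: the second detecting CP is held back, while a detection a full interval later may escalate.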

If no action was taken for this CP within the last ESL interval, a check
is made to see if an ESL was detected against any CP within the last two
ESL intervals (23).

The question here is whether two consecutive ESL occurrences represent
repeated manifestations of the same problem (i.e., a persistent problem)
or whether each ESL occurrence represents a separate problem. If an ESL
is identified as occurring for a persistent problem, the recovery action
for that ESL will be the next one in the series of increasingly severe
actions for that particular failing CP.

If an ESL is determined to be the initial manifestation of a problem,
all the ESL indicators for all CPs are reset so that any sequence of
actions for any CP starts at the first action.

The Excessive Spin Loop Recovery Processor (ESLR) maintains a field
(LASTDT) (28) in global storage showing the time of the last detection
of an ESL against ANY CP.

A persistent problem exists if:

    T - LASTDT < 2 x ESLI

where:
T = time of this entry to the ESL Recovery routine
ESLI = excessive spin loop interval.

When processing of this ESL is complete, LASTDT is updated with the
current time at exit from the ESLR process.
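
The persistence test translates directly into code. The sketch below is illustrative; the function names and the choice to represent "reset all ESL indicators" as zeroing a per-CP index dictionary are assumptions.

```python
def is_persistent(now, lastdt, esl_interval):
    """T - LASTDT < 2 x ESLI, with no prior detection treated as not persistent."""
    return lastdt is not None and now - lastdt < 2 * esl_interval


def reset_if_new_problem(index, now, lastdt, esl_interval):
    """Zero every CP's escalation index when this ESL is a first manifestation."""
    if not is_persistent(now, lastdt, esl_interval):
        for cp in index:
            index[cp] = 0
    return index


# ESLI = 100 ticks; last detection anywhere in the complex was at tick 131.
same_problem = is_persistent(201, 131, 100)   # 70 ticks later
new_problem = is_persistent(400, 131, 100)    # 269 ticks later
```

With these values `same_problem` is True, so escalation continues, and `new_problem` is False, so every CP's sequence starts again at the first action.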

Given that the time between entries to ESLR from a given spin routine is
equal to ESLI plus a very small delta consisting of linkage time from
the spin routine to the ESLR process, it follows that the spin routine
will continue to call ESLR in less than two spin loop time-out intervals
until it has obtained its required resource. However, a given invocation
of ESLR may be locked out if another CP has already serialized the ESLR
function. Therefore, ESLR must be cognizant of all entries to ESLR from
any CP. If no entry to ESLR occurs from any CP for two or more spin loop
time-out intervals, then it follows that ALL spinning routines obtained
ALL their desired resources subsequent to the last call to ESLR.
A check is made that the failing CP is not executing a routine that is
exempted from excessive spin loop recovery processing, as indicated in
the LCCA block (27). A mechanism providing an exemption is required
because there are legitimate system routines which would otherwise
trigger ESL conditions because the time to complete the function exceeds
the ESL time-out value. This allows the system routines to set an
indicator around such a function, in a field checked by the ESL recovery
process. This exemption mechanism allows the ESL interval to be reduced
far below its value in previous MVS systems of 40 seconds, to
significantly improve ESL recovery performance. It eliminates the need
to spin for such long periods to avoid an ESL detection and recovery
action for a legitimate, temporary condition. Some MVS functions
included in this validly exempted category are those which load
restartable CP wait states for operator communication, place a CP
temporarily in a stopped state, or communicate with the operator via the
disabled console communication facility.
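
One way to picture the exemption indicator is a flag that a long-running routine sets around its critical function and that recovery processing checks before acting. The sketch below is an assumption-laden illustration: the `exempt` dictionary stands in for the per-CP field in the LCCA, and `esl_exempt` and `may_recover` are hypothetical names.

```python
from contextlib import contextmanager

exempt = {}   # hypothetical per-CP indicator, standing in for the LCCA field


@contextmanager
def esl_exempt(cp):
    """Mark `cp` as running an exempt routine for the duration of the block."""
    exempt[cp] = True
    try:
        yield
    finally:
        exempt[cp] = False   # cleared even if the routine fails


def may_recover(failing_cp):
    """Recovery action is allowed only when the target CP is not exempt."""
    return not exempt.get(failing_cp, False)


with esl_exempt(3):
    inside = may_recover(3)   # CP 3 is in a legitimate long-running function
after = may_recover(3)
```

Here `inside` is False and `after` is True, so a disruptive action is never aimed at a CP while its routine has declared itself exempt.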

If the failing CP is not executing an exempt routine, recovery action is
initiated for that failing CP. This recovery action processing is
further described in Figure 3. Having taken the appropriate recovery
action, the current clock value is placed in the LAT field (26) of the
failing CP and the global ESL field (LASTDT (28)), and return is made to
the caller.

Referring to Figure 3, on entry to recovery action processing an index
associated with the failing CP is incremented. A check is then made
against the value of the index. If the value equals 1, a return is made
to the caller. This results in a continuation of spinning on the desired
lock for another ESL interval. It is important to wait for this
additional ESL interval, since it is possible that a call may have been
made to excessive spin loop recovery processing in the window of time
between the clearing of the exemption flag and the enabling of the
associated CP, and in this case no disruptive recovery action is
desired.

If the index is equal to 2, an indicator is set in the CVT control block
indicating ABEND as the recovery action. A Signal Processor instruction
indicating restart is then issued to the failing CP to give control to
the restart FLIH. Return is then made to the caller. On the failing CP,
the RESTART FLIH checks the CVT indicator, sets a flag indicating the
ABEND action, and passes control to the Recovery Termination Manager to
execute the ABEND action, which allows the recovery routines to retry
after performing any necessary clean up.

If the index is equal to 3, the CVT flag is set to indicate the
TERMINATE recovery option. A Signal Processor instruction indicating
restart is then issued to the failing CP, and control passes to the
Recovery Termination Manager to execute the TERMINATE action. The
TERMINATE option differs from the ABEND option in that it does not allow
retry. Resources owned by the failing program are released, and the unit
of work is forced to terminate. Return is then made to the caller.

If the index is equal to 4, Alternate CP Recovery (ACR) is initiated for
the failing CP. This is effected by the detecting processor simulating
the receipt of a malfunction alert interruption from the failing CP,
which initiates actions resulting in taking this CP off-line.
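
The index-driven hierarchy of Figure 3 can be summarized in a short sketch. The enum and function names are illustrative, and the handling of an index above 4 (holding at ACR) is an assumption, since the text does not say what happens after ACR has been initiated.

```python
from enum import Enum


class Action(Enum):
    SPIN = 1        # return to the caller and spin for another ESL interval
    ABEND = 2       # terminate the routine, recovery routines may retry
    TERMINATE = 3   # terminate the routine, no retry
    ACR = 4         # Alternate CP Recovery: take the failing CP off-line


def next_action(index, failing_cp):
    """Increment the failing CP's index and return the action it selects."""
    # Assumption: the index is held at 4, so ACR repeats once reached.
    index[failing_cp] = min(index.get(failing_cp, 0) + 1, 4)
    return Action(index[failing_cp])


index = {}
sequence = [next_action(index, 3).name for _ in range(5)]
other_cp = next_action(index, 5).name
```

Each failing CP carries its own index, so `sequence` walks SPIN, ABEND, TERMINATE, ACR for CP 3 while `other_cp` still starts at SPIN for CP 5.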

6-WAY EXAMPLE

Figure 4 illustrates Excessive Spin Loop Recovery processing active in a
6-way MP system, with two independent excessive spin loops: the first
involves CPs 0, 1 and 2 all waiting for a resource held by failing CP 3;
the second involves CP 4 waiting for a resource held by failing CP 5.
The example shows:

1. Simultaneous resolution of independent ESLs.

2. Correct progression through the hierarchy of recovery actions for
each ESL, taking increasingly severe action when the previous action
failed to resolve the problem.

3. Pacing of actions taken for related ESLs (multiple CPs spinning on
the same failing CP).

At times T, T+2, and T+3, the waiting CPs (0, 1, 2 and 4) request the
needed resource of CP 3 or 5. At T+10, CP 0, noticing that an ESL
interval (here, 10 seconds) has elapsed without obtaining the resource,
calls ESLR processing, which sets the CP 3 index to 1 and saves the time
of this ESLR processing (T+10.1) in the LAT field for CP 3 (figure 2B at
26) and LASTDT (28), and then returns to the caller, who continues to
spin (as indicated in figure 3, since this is the initial detection).

At T+12, CP 4 detects an ESL and calls ESLR, which sets the CP 5 index
to 1 and saves the time (T+12.1) in the LAT entry for CP 5 (26) and
LASTDT (28), and then continues to spin (fig. 3). Simultaneously at
T+12, CP 1 detected an ESL and invoked ESLR, which immediately returned
since ESLR was already active on CP 4 (see fig. 2A at 21). At T+13, CP 2
detected its ESL and called ESLR, which takes no recovery action since
one was taken for this failing CP (CP 3) within the last ESL interval
(see fig. 2A at 22). The time (T+13.1) is saved in LASTDT (28).

At T+20.1, another ESL interval having passed for CP 0, ESLR is again
invoked; since no action was taken for failing CP 3 within the last ESL
interval (T+10.1 - T+20.1) (see fig. 2A at 22), a recovery action is
taken: the index for CP 3 is incremented to 2 (fig. 3 at 31) and ABEND
is signalled to CP 3 (32). The time (T+20.2) is saved in the LAT for CP
3 (26) and LASTDT (28). At T+22, CP 1 detects the passing of another ESL
interval and calls ESLR, which takes no action since action was taken
for CP 3 within the past ESL interval (fig. 2A at 22). The time (T+22.1)
is saved in LASTDT (28). Also at T+22.1, CP 4 detects the expiration of
an ESL interval and calls ESLR, which immediately returns since ESLR is
already running on CP 1 (fig. 2A at 21). At T+23.1, CP 2 notes the
passing of an ESL interval and calls ESLR, which takes no action since
action was taken for CP 3 within the last ESL interval (fig. 2A at 22).
The time (T+23.2) is saved in LASTDT (28).

At time T+30.2, CP 0 detects the passage of another ESL interval (the
ABEND signalled to CP 3 at T+20.2 has not resolved the problem on CP 3)
and calls ESLR, which, since no action was taken for CP 3 within the
last ESL interval, increments CP 3's index (fig. 3 at 31) to 3, then
signals "Terminate" to CP 3 (33). Time (T+30.3) is saved in the LAT
entry for CP 3 (26) and in LASTDT (28). Note that in this example, the
Terminate action against the unit of work on CP 3 resolves the spin loop
on CPs 0, 1 and 2.

At T+32.1, CP 4, detecting the expiration of another ESL interval
(T+22.1 - T+32.1), calls ESLR. ESLR, realizing that no action was taken
for CP 5 within the last ESL interval (T+22.1 - T+32.1; LAT for CP 5 is
T+12.1), but that there was an ESL detected against some CP within the
last two ESL intervals (fig. 2A at 23), increments the index associated
with CP 5 to 2 (fig. 3 at 31) and signals ABEND to CP 5 (32). The time
(T+32.2) is saved in LAT for CP 5 (26) and in LASTDT (28). In the
example, the ABEND action against the unit of work on CP 5 resolves the
spin loop on CP 4.
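
The timeline above can be replayed with a small state machine. This is a sketch under stated assumptions, not the patented implementation: times are integer tenths of a second with T = 0, each ESLR pass is assumed to occupy the tick on which it is entered and the following one (which is how the lock-out is modelled), the exemption check is omitted, and the class and label names (`Eslr`, `LOCKED_OUT`, `PACED`) are invented for the illustration.

```python
ESLI = 100   # ESL interval in tenths of a second (10 s in the example)
ACTIONS = {1: "SPIN", 2: "ABEND", 3: "TERMINATE", 4: "ACR"}


class Eslr:
    def __init__(self, cps):
        self.index = {cp: 0 for cp in cps}    # escalation index per failing CP
        self.lat = {cp: None for cp in cps}   # time of last action per CP
        self.lastdt = None                    # last detection against any CP
        self.busy_until = -1                  # models the Test and Set gate

    def detect(self, now, failing_cp):
        """One entry to ESLR at time `now` against `failing_cp`."""
        if now <= self.busy_until:
            return "LOCKED_OUT"               # ESLR is active on another CP
        done = self.busy_until = now + 1      # assumed exit time of this pass
        last = self.lat[failing_cp]
        if last is not None and now - last < ESLI:
            self.lastdt = done                # paced: record the detection only
            return "PACED"
        if self.lastdt is None or now - self.lastdt >= 2 * ESLI:
            self.index = {cp: 0 for cp in self.index}   # new problem: restart
        self.index[failing_cp] = min(self.index[failing_cp] + 1, 4)
        self.lat[failing_cp] = self.lastdt = done
        return ACTIONS[self.index[failing_cp]]


# (entry time, detecting CP, failing CP), in the order given in the text.
EVENTS = [(100, 0, 3), (120, 4, 5), (120, 1, 3), (130, 2, 3), (201, 0, 3),
          (220, 1, 3), (221, 4, 5), (231, 2, 3), (302, 0, 3), (321, 4, 5)]

eslr = Eslr(range(6))
outcomes = [eslr.detect(now, failing) for now, _detector, failing in EVENTS]
```

Under these assumptions the replay gives SPIN, SPIN, LOCKED_OUT, PACED, ABEND, PACED, LOCKED_OUT, PACED, TERMINATE, ABEND, matching the narrative, and the saved times for CP 3 and CP 5 come out as T+30.3 and T+32.2.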



Claims 

1. In a multiprocessing system complex comprising at least two
processors, an operating system, and resources shared among processors,
a method for recognition of and recovery from excessive spin loops by
the operating system comprising:

A) detecting, by a detecting routine in a first processor, that said
first processor has been in a spin loop requiring a resource held by a
resource-holding routine in another processor for a fixed time period;

B) identifying a target processor in said system complex as a target for
responsive recovery action;

C) performing no responsive recovery actions if a bypass indicator set
by a routine in said target processor is on;

D) automatically performing for said target processor one of a
hierarchical sequence of responsive programmed recovery actions if said
bypass indicator is off;

E) continuing to identify said target processor and to perform
subsequent hierarchical recovery actions for said target processor until
said processor is no longer so identified as said target;

F) continuing to so detect spin loops requiring any of said resources
for said fixed time period and to identify target processors and perform
target processor-specific hierarchical recovery actions until all of
said resources are acquired by all detecting processors.

2. The method of claim 1 in which a subsequent one of said recovery
actions in said hierarchical sequence is performed for said target
processor only if an immediately preceding one of said hierarchical
recovery actions has been performed for said target processor longer ago
than one of said fixed time periods.

3. The method of claim 2 in which said subsequent action in said
hierarchical sequence is performed if there has been said detecting of
one of said spin loops requiring one of said resources held by any of
said processors in said multiprocessing complex within two of said fixed
time periods, and in which an initial one of said hierarchical actions
is performed otherwise.

4. The method of claim 3 in which said hierarchical sequence comprises
the action of abnormally terminating said routine in said target
processor in a manner that permits a resource-holding routine in said
target processor to resume normal execution after cleanup.

5. The method of claim 3 in which said hierarchical sequence comprises
the actions of:

A) continuing to wait for said resource to be released for a second
fixed time period;

B) abnormally terminating a resource-holding routine in said target
processor in a manner that permits said routine in said target processor
to resume normal execution after cleanup;

C) terminating said resource-holding routine in said target processor in
a manner that does not permit said routine in said target processor to
resume normal execution;

D) removing said target processor from said multiprocessor system
complex.

6. The method of claim 3 in which said hierarchical sequence comprises
the following actions, in the order listed:

A) continuing to wait for said resource to be released for a second
fixed time period;

B) abnormally terminating said resource-holding routine in said target
processor in a manner that permits said routine in said target processor
to resume normal execution after cleanup;

C) terminating said resource-holding routine in said target processor in
a manner that does not permit said routine in said target processor to
resume normal execution;

D) removing said target processor from said multiprocessor system
complex.

7. In a multiprocessing system complex comprising at least two
processors, an operating system, and resources shared among processors,
a mechanism for recognition of and recovery from excessive spin loops by
the operating system comprising:

A) detection means for detecting that a first processor has been in a
spin loop requiring a resource held by a routine in a second processor
for a fixed time period;

B) identification means for identifying a target processor in said
system complex as a target for responsive recovery action when said
detecting means detects said spin loop;

C) a processor-bypass indicator associated with each of said processors
and having an "on" setting and an "off" setting, said bypass indicator
being set to said "on" setting when an exempt routine is executing in
said processor associated with said "on" bypass indicator;

D) responsive recovery means for freeing said resource held by said
target processor only if said processor-bypass indicator associated with
said target processor is "off".

8. The mechanism of claim 7 in which said responsive recovery means
comprises a hierarchical set of recovery functions, which further
comprise an ABEND-triggering function for causing a resource-holding
routine executing in said target processor to abnormally terminate,
allowing retry.

9. The mechanism of claim 7 in which said responsive recovery means
comprises a hierarchical set of recovery functions, said functions
comprising:

A) a spin function for permitting said first processor to remain in said
spin loop for a second fixed time period;

B) an ABEND-triggering function for causing a resource-holding routine
executing in said target processor to abnormally terminate, allowing
retry;

C) a TERMINATE-triggering function for causing a resource-holding
routine executing in said target processor to terminate without retry;

D) an ACR function for removing said target processor from said
multiprocessor system complex.

10. The mechanism of claim 9 further comprising means for causing
successive detections of said spin loop fixed time periods resulting in
identification of the same target processor or a different target
processor to cause invocation of one of said [...] invoked in the [...]
processor, [...] invoked less recently than [...] identified target
processor.

11. The mechanism of [...] pacing means for causing a successive [...]
of said spin loop fixed time period [...] detection to invoke a [...]
for said identified target processor [...] successive detection occurs
within 2 fixed [...] intervals of said prior detection.

FIG. 1 (flow labels). Central Processor 1: GET LOCK X; DISABLED LOOP
(102); RELEASE LOCK X (103). Central Processor 2: REQUEST LOCK X; SPIN;
EXCESSIVE SPIN DETECTED; EXCESSIVE SPIN RECOVERY (SEE FIG. 2).

FIG. 2 (flow labels, final steps). NO: INITIATE RECOVERY ACTION FOR
FAILING CP (SEE FIG. 3). LAT: PUT CURRENT CLOCK VALUE IN ESL FIELD OF
FAILING CP. LASTDT: PUT CURRENT CLOCK VALUE IN GLOBAL ESL FIELD. RETURN.

FIG. 3 (flow labels). RECOVERY ACTION PROCESSING: INCREMENT INDEX FOR
FAILING CP. INDEX = 1: RETURN (SPIN). INDEX = 2 (32): SET ABEND
INDICATORS AND SIGNAL "ABEND" TO FAILING CP (SIGP RESTART, ABEND);
RETURN (ABEND). INDEX = 3 (33): SET TERM INDICATORS AND SIGNAL "TERM" TO
FAILING CP (SIGP RESTART, TERM); RETURN (TERM). INDEX = 4: INITIATE ACR
FOR FAILING CP; RETURN (ACR).